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A model for network panel data is discussed, based on the as- 
sumption that the observed data are discrete observations of a 
continuous-time Markov process on the space of all directed graphs 
on a given node set, in which changes in tie variables are indepen- 
dent conditional on the current graph. The model for tie changes is 
parametric and designed for applications to social network analysis, 
where the network dynamics can be interpreted as being generated 
by choices made by the social actors represented by the nodes of 
the graph. An algorithm for calculating the Maximum Likelihood 
estimator is presented, based on data augmentation and stochastic 
approximation. An application to an evolving friendship network is 
given and a small simulation study is presented which suggests that 
for small data sets the Maximum Likelihood estimator is more effi- 
cient than the earlier proposed Method of Moments estimator. 

1. Introduction. Relations between social actors can be studied by meth- 
ods of social network analysis [e.g., Wasserman and Faust (1994); Carring- 
ton, Scott and Wasserman (2005)]. Examples are friendship between pupils 
in a school class or alliances between firms. A basic data structure for social 
networks is the directed graph or digraph, where the actors are represented 
by the nodes, and the arcs between the nodes indicate the social ties. Social 
network analysis traditionally has had a focus on rich description of net- 
work data, but the recent development of methods of statistical inference 
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for network data [e.g., Airoldi et al. (2007); Hunter and Handcock (2006)] 
has the potential of moving this field toward a wider use of inferential ap- 
proaches. Longitudinal studies are especially important for obtaining insight 
into social networks, but statistical methods for longitudinal network data 
that are versatile enough for realistic modeling are only just beginning to 
be developed. 

This article considers repeated observations of a relation, or network, on 
a given set of actors TV = {1, . . . , n}, observed according to a panel design, 
and represented as a sequence of digraphs x(t m ) for m = 1, . . . , M, where 
t± < ■ ■ ■ < tM are the observation moments. The nodes represent social ac- 
tors (which may be individuals, companies, etc.), and the node set A^ is the 
same for all observation moments. A digraph is defined here as an irrenexive 
relation, that is, a subset x of {(i,j) € M 2 \ i ^ j}, and when (i,j) S x we 
shall say that there is an arc, or a tie, from i to j. It often is realistic to 
assume that the social network has been developing between the observa- 
tion moments, which leads to the assumption that the observations x(t m ) 
are realizations of stochastic digraphs X(t m ) embedded in a continuous-time 
stochastic process X(t), t\ < t < tM- Holland and Leinhardt (1977) proposed 
to use continuous-time Markov chains, defined on the space of all digraphs 
with a given node set, for modeling social network dynamics even if the 
observations are made at a few discrete time points and not continuously. 
Continuous-time Markov chains provide a natural starting point for model- 
ing longitudinal network data. Wasserman (1979, 1980) and Leenders (1995) 
elaborated dyad-independent models, where the ties between pairs of actors 
(dyads) develop according to processes that are mutually independent be- 
tween dyads. This is not realistic for social processes, because dependence 
between the set of ties among three or more actors can be very strong, as 
was found already by Davis (1970) who showed that a basic feature of many 
social networks is the tendency toward transitivity ("friends of my friends 
are my friends"). Snijders and van Duijn (1997) and Snijders (2001) pro- 
posed so-called actor- oriented models, explained in the next section, which 
do allow such higher-order dependencies. 

The actor-oriented models are too complicated for the calculation of like- 
lihoods or estimators in closed form, but they represent stochastic processes 
which can be easily simulated. This was exploited by the estimation method 
proposed in the papers mentioned, which is a Method of Moments (MoM) es- 
timator implemented algorithmically by stochastic approximation [Robbins 
and Monro (1951); also see, e.g., Kushner and Yin (2003)]. This estimator 
usually performs well [some empirical applications are given in de Federico 
(2003), van de Bunt, van Duijn and Snijders (1999), and van Duijn et al. 
(2003)]. 

It is to be expected, however, that the statistical efficiency of the Maxi- 
mum Likelihood (ML) estimator will be greater. ML estimation also paves 
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the way for likelihood-based model selection, which will be a marked im- 
provement on existing methods of model selection. This article presents an 
MCMC algorithm for approximating the ML estimator, combining and ex- 
tending ideas in Gn and Kong (1998), Snijders (2001), and Koskinen and 
Snijders (2007), the latter of which proposed for this model a MCMC algo- 
rithm for Bayesian inference. Section 2 presents the model definition. The al- 
gorithm for obtaining the ML estimator is described in Section 3. Section 4 
reports results of an empirical example and Section 5 presents a very small 
simulation study comparing the ML and MoM estimators. The paper fin- 
ishes with an algorithm for approximating the likelihood ratio test in Section 
6 and a discussion in Section 7. 

2. Model definition. We assume that repeated observations x(t±),..., 
x(tM) ° n the network are available for some M > 2. The network, or digraph, 
x will be identified with its n x n adjacency matrix, of which the elements 
denote whether there is a tie from node i to node j (xij = 1) or not (xij = 0). 
Self-ties are not allowed, so that the diagonal is structurally zero. Random 
variables are denoted by capitals. The stochastic process X(t), in which the 
observations x(t m ) are embedded, is modeled as being right-continuous. 

Various models have been proposed, most of them being Markov pro- 
cesses of some kind. We focus on actor-oriented models [Snijders (2001)]. 
Tie-oriented models [Snijders (2006)] can be treated similarly. The basic 
idea of actor-oriented models [Snijders (2001)] is that the nodes of the graph 
represent social actors, who have control, albeit under constraints, of their 
outgoing ties; and the graph develops as a continuous-time Markov process 
(even though it is observed only at M discrete time points). The constraints 
are that ties may change only one by one, and actors do not coordinate 
their changes of ties. Thus, at any given moment, one actor i may create 
one new tie or delete one existing tie, where the probability distribution of 
such changes depends on the current digraph; excluded are simultaneous 
changes such as swapping one tie for another, or bargaining between actors 
over ties. This constraint was proposed already by Holland and Leinhardt 
(1977), and it has the virtue of splitting up the change process in its smallest 
possible constituents. 

The actor-oriented model is further specified as follows; further discussion 
and motivation is given in Snijders (2001). 

1. Opportunities for change 

Each actor i gets at stochastically determined moments the opportunity 
to change one of the outgoing tie variables Xij(t) (j G A/", j ^ i). Since 
the process is assumed to be Markovian, waiting times between opportuni- 
ties have exponential distributions. Each of the actors i has a rate function 
Xi(a,x) which defines how quickly this actor gets an opportunity to change 
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a tie variable, when the current value of the digraph is x, and where a is 
a parameter. At any time point t with X(t) = x, the waiting time until the 
next opportunity for change by any actor is exponentially distributed with 
parameter 

(1) A(a, x) = \{ot, x). 

i 

Given that an opportunity for change occurs, the probability that it is actor 
i who gets the opportunity is given by 

(2) in(a,x) = \i(a,x)/\(a,x). 

The rate functions can be constant between observation moments, or they 
can depend on functions rn-(x) which may be covariates or positional char- 
acteristics of the actors such as outdegrees j x ij • A convenient assumption 
is to use an exponential link function, 

(3) Xi(a,x) = exp(^2a k r ik (x)^J. 

2. Options for change 

When actor i gets the opportunity to make a change, this actor has a 
permitted set Ai(x°) of values to which the digraph may be changed, where 
x° is the current value of the digraph. The assumption that the actor controls 
his or her outgoing ties, but can change only one tie variable at the time, is 
equivalent to 

(4a) A(x°)c{x }U^(A 

where A\(x®) is the set of adjacency matrices differing from x in exactly one 
element, 

A\(x G ) = {x | Xij = 1 — xfj for one j / i, 

(4b) 

and Xhk = %hk f° r & U other (h, k)}. 

Including x° in Ai(x°) can be important for expressing the property that 
actors who are satisfied with the current network will prefer to keep it un- 
changed. Therefore, the usual model is Ai(x°) = {x } UAl(x°). This does 
not lead to identifiability problems because the ratio between not making a 
change and making a change is not a free parameter, but fixed by assumption 
(5) for the conditional probabilities of the new state of the network. 

Some alternatives are models with structurally impossible ties, where the 
impossible digraphs are left out of Ai(x°), and models where the actor is 
required to make a change whenever there is the opportunity, obtained by 
leaving the current element x° out of Ai(x°). 
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It is assumed that the network dynamics is driven by a so-called objective 
function fi(/3,x ,x) that can be interpreted as the relative attractiveness 
for actor i of moving from the network represented by x° to the network x, 
and where (3 is a parameter. Under the condition that the current digraph 
is x° and actor i gets the opportunity to make a change, the conditional 
probability that the next digraph is x is modeled as 

fexp(/ i (/3,x°,x))/Ve X p(/ l (/3, 2 ; ,x)), xeAi(x°), 
(5) Pi ((3,x°,x) = I i 

[o, x^Ai(x°), 

where the summation extends over x € Ai(x°). This formula can be mo- 
tivated by a random utility argument as used in econometrics [see, e.g., 
Maddala (1983)], where it is assumed that the actor maximizes fi((3,x°,x) 
plus a random disturbance having a standard Gumbel distribution. Assump- 
tion (4) implies that instead of Pi(/3, x°, x) we can also write Pij(/3, x ) where 
the correspondence between x and j is defined as follows: if x 7^ x°, j is the 
unique element of J\f for which Xij ^ x?-; if x = x°, j = i. This less redundant 
notation will be used in the sequel. Thus, for j ^ i, pij(f3,x°) is the prob- 
ability that, under the condition that actor i has the opportunity to make 
a change and the current digraph is x , the change will be to Xij = 1 — x\^ 
with the rest unchanged, while pu((3,x°) is the probability that, under the 
same condition, the digraph will not be changed. 

The most usual models are based on objective functions that depend on 
x only. This has the interpretation that actors wish to maximize a function 
fi(P,x) independently of "where they come from." The greater generality of 

(5) , where the objective function can depend also on the previous state x°, 
makes it possible to model path-dependencies, or hysteresis, where the loss 
suffered from withdrawing a given tie differs from the gain from creating 
this tie, even if the rest of the network has remained unchanged. 

Various ingredients for specifying the objective function were proposed in 
Snijders (2001). A linear form is convenient, 

L 

(6) fi(P,x°,x) = ^2p k s ik (x°,x), 

k=l 

where the functions Sik(x°,x) are determined by subject-matter knowledge 
and available social scientific theory. These functions can represent essential 
aspects of the network structure, assessed from the point of view of actor i, 
such as 



(7) s ik (x°,x) = ^2xij 
j 
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they can also depend on covariates — such as resources and preferences of 
the actors, or costs of exchange between pairs of actors — or combinations 
of network structure and covariates. For example, de Federico (2003) found 
in a study of friendship between foreign exchange students that friendship 
formation tends to be reciprocal, with a negative parameter for forming 
indirect ties (thus leading to relatively closed networks), and that friendships 
are more likely to be formed between individuals from the same region (a 
covariate effect), but that reciprocation adds less to the tendency to form ties 
between persons from the same region than between arbitrary individuals 
(negative interaction between covariate and reciprocity). 

3. Intensity matrix; time-homogeneity 

The model description given above defines X(t) as a continuous-time 
Markov process with Q-matrix or intensity matrix [e.g., Norris (1997)] for 
x ^ x° given by 



The assumptions do not imply that the distribution of X(t) is stationary. 
The intensity matrix is time- homogeneous, however, except for time depen- 
dence reflected by time- varying components in the functions s^^x , x). 

For the data-collection designs to which this paper is devoted, where ob- 
servations are done at discrete time points ti, . . - , tjvf 3 these time points can 
be used for marking time-heterogeneity of the transition distribution (cf. the 
example in Section 6). For example, covariates may be included with values 
allowed to change at the observation moments. A special role is played here 
by the time durations t m — t m -\ between successive observations. Standard 
theory for continuous-time Markov chains [e.g., Norris (1997)] shows that 
the matrix of transition probabilities from X(t m ) to X(t m -i) is e ( i ™- i ™-i)Q ) 
where Q is the matrix with elements (12). Thus, changing the duration 
t m —tm-i can De compensated by multiplication of the rate function Xi(a,x) 



(12) 
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by a constant. Since the connection between an externally denned real- valued 
time variable t m and the rapidity of network change is tenuous at best, it 
usually is advisable [e.g., see Snijders (2001)] to include a multiplicative pa- 
rameter in the rate function which is constant between observation moments 
t m but differs freely between periods (t m -i,t m ). With the inclusion of such 
a parameter, the numerical values t m become unimportant because changes 
in their values can be compensated by the multiplicative rate parameters. 

2.1. Comparison with discrete-time Markov chain models. Popular mod- 
els for analyzing cross-sectional network data are exponential families of 
graph distributions such as the Markov model of Frank and Strauss (1986), 
generalized to the p* Exponential Random Graph (ERG) model by Frank 
(1991) and Wasserman and Pattison (1996), and elaborated and further 
specified by Hunter and Handcock (2006) and Snijders et al. (2006). Discrete- 
time Markov chain models, as longitudinal models of this kind, were pro- 
posed by Robins and Pattison (2001), Krackhardt and Handcock (2007), and 
Hanneke and Xing (2007). There are a number of essential differences be- 
tween these models and the actor-oriented model treated here, with respect 
to interpretation as well as statistical procedures. 

When applying the actor-oriented model to a sequence of two or more re- 
peated observations, these observations are embedded in a continuous-time 
model. This has clear face validity in applications where changes in the net- 
work take place at arbitrary moments between observations. Singer (2008) 
gives an overview of the use of this principle for continuously distributed 
data. This approach yields a major advantage over the cited longitudinal 
ERG models: our probability model is defined independently of the obser- 
vational design. Data from irregularly spaced observation intervals can be 
analyzed without the need to make any adaptations. Another advantage is 
that the dynamic process is defined parsimoniously as a function of its el- 
ementary constituents — in this case, the conditional probability of a single 
tie change. The random utility interpretation of (5), discussed above, gives 
an interpretation in terms of myopic optimization of an objective function 
to which a random disturbance is added, and which can be used to represent 
a "social mechanism" that could have given rise to the observed dynamics. 

The discrete-time ERG models constitute exponential families, which 
has the advantage that many standard techniques can be directly applied. 
The distribution of the continuous-time process {X(t),ti <t< tu} for the 
actor-oriented model constitutes an exponential family, so for discrete ob- 
servation moments our model can be regarded as an incompletely observed 
exponential family. For both types of model, inference is computer-intensive 
and time-consuming because of the need to implement elaborate MCMC 
procedures. The definition of the model given above can be used directly 
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to simulate data from the probability distribution, conditional on an ini- 
tial state of the network. This contrasts with models in the ERG family, 
which can be simulated only indirectly, by regarding them as the station- 
ary distribution of an auxiliary Markov chain, and applying a Gibbs or 
Metropolis-Hastings algorithm. The near degeneracy problem [Snijders et 
al. (2006)] which plagues some specifications of ERG models is, in practice, 
not a problem for the actor-oriented model. 

3. ML estimation. This section presents an algorithm for MCMC ap- 
proximation of the maximum likelihood estimate. Estimation is done condi- 
tional on the first observation x(t\). This has the advantage that no model 
assumptions need to be invoked concerning the probability distribution that 
may have led to the first observed network x(ti), and the estimated param- 
eters refer exclusively to the dynamics of the network. 

The algorithm proposed below can be sketched as follows. For each m 
(m = 2, . . . , M) the observed data are augmented with random draws from 
the sequence of intermediate digraphs x(t) that could have led from one 
observation, x(i m _i), to the next, x(t m ) (Section 3.1). These draws can be 
simulated using a Metropolis-Hastings algorithm (Section 3.3). They are 
then used in the updating steps of a Robbins-Monro algorithm, following 
ideas of Gu and Kong (1998), to find the solution of the likelihood equation 
(Section 3.2). These elements are put together in Section 3.4. 

3.1. Augmented data. The likelihood for the observed data x(t<}),..., 
x(tM) conditional on x(ti) cannot generally be expressed in a computable 
form. Therefore, the observed data will be augmented with data such that 
an easily computable likelihood is obtained, employing the general data aug- 
mentation principle proposed by Tanner and Wong (1987). The data aug- 
mentation can be done for each period (t m -i,t m ) separately and, therefore, 
this section considers only the observations x{t\) and xfo). 

Denote the time points of an opportunity for change by T r and their total 
number between t\ and t<i by R, the time points being ordered increasingly 
so that t\ = To < T\ < T2 < ■ ■ ■ < Tr < £2- The model assumptions imply that 
at each time T r , there is one actor, denoted I r , who gets an opportunity for 
change at this time moment. Define J r as the actor toward whom the tie 
variable is changed, and define formally J r = I r if there is no change [i.e., if 
x(T r ) = x(T r _i)]: 

(I r ,J r ) is the only for which Xij(T r _\) 7^ 

Xij(T r ) if there is such an {i, j); else J r = I r . 

Given x(ti), the outcome of the stochastic process (T r ,I r , J r ), r = 1, . . . ,R 
completely determines x{t),t\ < t < £2- 
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The augmenting data consist of R and (I r , J r ), r = 1, . . . , R, without the 
time points T r . It may be noted that this differs from the definition of a 
sample path in Koskinen and Snijders (2007) in that the times in between 
opportunities for change are integrated out; the reason is to obtain a sim- 
pler MCMC algorithm. The possible outcomes of the augmenting data are 
determined by the condition 

even, if Xij(t 2 ) = Xijih), 



(13) t{r\l<r<R,(i r J r ) = (iJ)} 



odd, if Xij(t 2 ) ^Xijih 



for all with i ^ j. The stochastic process V = ((I r , J r ), r = 1, . . . , R) will 
be referred to as the sample path; the elements for which I r = J r , although 
redundant to calculate x(t 2 ) from x(t\), are retained because they facilitate 
the computation of the likelihood. Define x^ = x(Tj.); the digraphs x^ and 
x {k-i) differ i n element (i^, </&) provided Ik 7^ Jk-, an d in no other elements. 
The probability function of the sample path, conditional on x(ti), is given 

by 

p sp {V = ((n, ji), . . . , (i R ,j R ));a, P} 

(14) = P a ^{T R <t 2 < TR+^xW^iuh), (i R ,j R )} 

R 

xII T v(^ (r " 1) K*(Ai (r " 1) ) I 

r=l 

where 7Tj is defined in (2) and pij in and just after (5). Denote the first 
component of (14) by 

K(a,x {0 \(i 1 ,j 1 ),...,(i R ,j R )) 

(15) 

= P a ,/3{T R <t 2 < T R+ i|x (0) , (h,ji), (i R ,j R )}- 

Conditioning on x^°\ (i 2 ,j 2 ), • • • , [and not on x(t 2 )\], the differences 

T r+ \ — T r are independently exponentially distributed with parameters 
X(a,x^). Hence, under this conditioning the distribution of T R — t\ is 
the convolution of exponential distributions with parameters X(a,x^) for 
r = 0, . . . , R — 1. In the special case that the actor- level rates of change 
Xi(a,x) are constant, denoted by a±, R has a Poisson distribution with 
parameter na±(t 2 —t\); (15) then is given by 

, . , (n\ . . . . .. ... , . (noi\{t 2 — ti))^ 

(16) K(a,x^>,^ 1 ,ji),...,{i R ,j R ))=exp(-nai(t 2 - h)) — . 

In the general case where the change rates are nonconstant, the probability 

(15) can be approximated as follows. Denote by px R (t) the density function of 
T R , conditional on r ', (h,ji), (i 2 ,j 2 ), ■ ■ ■ , or, equivalently, on the embedded 
process x@' , x^, x( 2 \ The distribution of T R — t\ is a convolution of 
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exponential distributions and, therefore, the central limit theorem implies 
that the density function pr R (f) is approximately the normal density with 
expected value 

R-l 

(17) ^ a = ^{A(a,xM)}- 1 

r=0 

and variance 

(18) a 2 a = J2{X(a,x^)r 2 . 



r=Q 



Hence, 



(19) p TR (t) W - 1 expf . 

Probability (15) now can be expressed as 
K(a,x^ . . . ,{i R ,j R )) 

= / Pt r (s)P{T r+ x - T R >t 2 -T R \T R = s}ds 
Jti 

(20) = [ 2 P T R {s)eM-Ka,x (R) )(t2-s))ds 

Jti 

~PT R (t 2 ) / exp(-A(a,x (i?) )(t 2 -s))ds 
Jti 

PT R (t2) 

~ \{a,x( R )y 

The approximations are valid for large R and t 2 — t\, under boundedness con- 
ditions on the rate functions Aj. The first approximation in (20) is based on 
splitting the integration interval into two intervals (t±, t 2 — L) and (t 2 — L, t 2 ) 
for a bounded but large L; the first interval then gives an asymptotically 
negligible contribution and on the second interval Pt r (s) is approximately 
constant since var(Tp) = 0{R). The second approximation uses that t 2 — 1\ 
is large. Combining the preceding equations yields 

k(o>, x( 0) , (n, ji), . . . , (i R , j R )) 

(21) 

1 / -(h-ti- (l a ) 2 



: CXp 



2al 



A(a,x(«))v^ra2) 

This shows that, for observed data (x(ti),x(t 2 )) augmented by the sample 
path, the likelihood conditional on x{t\) can be expressed directly, either 
exactly or in a good approximation. 
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3.2. Missing data principle and stochastic approximation. An MCMC 
algorithm will be used that finds the ML estimator based on augmented 
data. Several methods for MCMC maximum likelihood estimation have 
been proposed in the literature. We shall use the Markov Chain Stochas- 
tic Approximation (MCSA) algorithm proposed by Gu and Kong (1998) 
and used (in a slightly different specification) also by Gu and Zhu (2001). 
This algorithm is based on the missing information principle of Orchard 
and Woodbury (1972) and Louis (1982) — going back to Fisher (1925); cf. 
Efron (1977). The principle can be summarized as follows. Suppose that x is 
observed, with probability distribution parameterized by 6 and having prob- 
ability density px( x ', &) w.r.t. some cr- finite measure. To facilitate estimation, 
the observed data is augmented by extra data v (regarded as missing data) 
such that the joint density is pxv ( x , w, Denote the observed data score 
function dlog(px(x; 9))/d9 by Sx{Q\x) and the total data score function 
dlog(pxv( x ,v;6))/d8 by Sxv(G]x,v). It is not hard to prove (see the cited 
literature) that 



This is the first part of the missing information principle. This equation 
implies that the likelihood equation can be expressed as 



and, therefore, ML estimates can be determined, under regularity conditions, 
as the solution of (23). 

This is applied in situations where the observed data score function Sxifi] x) 
is too difficult to calculate, while the total data score function Sxv(9] x ,v) 
is computable. In our case, we condition on X(t%), so this is treated as being 
fixed; we observe X = (Xfo), ■ ■ • ,X(£m)); and between each pair of consec- 
utive observations X(t m -i) and X(t m ) the data are augmented, as discussed 
in the previous section, by the sample path that could have brought the net- 
work from X(t m -i) to X(t m ). These sample paths combined for m = 2 to M 
constitute V. The following two subsections supply the additional elements 
for how the augmenting data are used. 

In the MCSA algorithm of Gu and Kong (1998), the solution of (23) is 
obtained by stochastic approximation [Robbins and Monro (1951); Kushner 
and Yin (2003)] which is defined by the updating step 



sequence, consists of positive numbers, tending to 0. The matrix D is a 
suitable matrix. It is efficient [see Kushner and Yin (2003)] to use a gain 



(22) 



E e {Sxv{0;x,V) \ X = x} = S x {9;x). 



(23) 



E e {Sxv(6;x,V)\X = x} = 0, 
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sequence tending to zero as ~ N~ c for a c < 1, and to estimate 9 not 
by the last element 6^ produced by the algorithm, but by a tail average 
(N — no + l)" 1 J2n=n 8^ ■ For fixed no and N — > oo, this average converges 
to the solution of (23) for a wide range of positive definite matrices D. 

The second part of the missing information principle [Orchard and Wood- 
bury (1972)] is that the observed Fisher information matrix for the observed 
data can be expressed as 

Dx(6) = -dS x (0;x)/d6 

(25) 

= Ee{D xv {6)\X = x}-Cov e {Sxv{0;x,V)\X = x}, 
where Dxv(&) is the complete data observed Fisher information matrix, 

(26) D XV (e) = -dS XV (0; x, V)/d9. 

Expression (25) can be interpreted loosely as "information is total (but 
partially unobserved) information minus missing information." This formula 
allows us to calculate standard errors. 

3.3. Simulating the sample path. The MCSA method relies on Monte 
Carlo simulations of the missing data V. The Markov property allows us 
to treat all periods (t m -i,t m ) separately and, therefore, we simplify the 
treatment and notation here by treating only one of those periods, tem- 
porarily assuming that M = 2. The missing data then is the sample path 
((Ji, Ji), . . . , (Ir, Jr)) which specifies the sequence of tie changes that brings 
the network from X(t\) to Xfo)- The advantage of having integrated out 
the time steps T r — with the use of the expressions (16) and (20) — is that 
less random noise is introduced, and the simulated variable V is discrete 
rather than including a Euclidean vector with a varying dimension R, which 
would require a more complicated MCMC procedure. 

The set of all sample paths connecting x(t\) and xfo) is the set of all finite 
sequences of pairs (i,j), i,j EM, where for i ^ j the parity of the number 
of occurrences of (i,j) is given by (13). Such sequences will be denoted by 
v = ((h,ji), ■ ■ ■ , (ir, Jr)), and the set of all these sequences is denoted V. 
The probability of the sample path conditional on X{t\) = x(ti),X(t2) = 
x{t2j is proportional to (14), rewritten here as 

R 

(27) K (a,x^,(h,j 1 ),...,(iR,j R )) x Hn^a^-^p^jM^), 

r=l 

where k is given by (16) or approximated by (21), depending on the specifica- 
tion of Aj . Draws from this distribution can be generated by the Metropolis- 
Hastings algorithm, provided that a proposal distribution is used which con- 
nects any two elements of V. 
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Such a proposal distribution will now be given, denoting the proposal 
probabilities by u(v\v) and the target probabilities, which are proportional to 
(27), by p(v). Then the acceptance probabilities in the Metropolis-Hastings 
algorithm, for a current state v and a proposed state v, are 



A proposal distribution is used that consists of small changes in v. The con- 
struction of the proposal distribution was based on the considerations that 
the proposal distribution should mix well in the set of all sample paths, 
and the Metropolis-Hastings ratios in (28) should be computable relatively 
easily. This led to proposal distributions consisting of the following types of 
small changes in v: 

1. Paired deletions. Of all pairs of indices ?r,r2 with (i ri ,jri) = {im jr?) i 
i ri T^Jn, one pair is randomly selected, and (i ri ,jn) an d (v 25 ir 2 ) are 
deleted from v. 

2. Paired insertions. A pair (i,j) € M 2 with i ^ j is randomly chosen, two 
indices ?T,r2 are randomly chosen, and the element is inserted im- 
mediately before r\ and before r<i- 

3. Single insertions. At a random place in the path (allowing beginning and 
end), the element (i,i) is inserted for a random i € M . 

4. Single deletions. Of all elements (i r ,jr) satisfying i r = j r , a randomly 
chosen one is deleted. 

5. Permutations. For randomly chosen n < r2, where T2 — r\ is bounded 
from above by some convenient number to avoid too lengthy calculations, 
the segment of consecutive elements (z r , j r ),r = r±, . . . ,V2 is randomly 
permuted. 

It is evident that these five operations yield new sequences within the per- 
mitted set V [cf. (13)]. Paired insertions and paired deletions are each others' 
inverse operations, and the same holds for single insertions and single dele- 
tions. These four types of operation together are sufficient for all elements of 
V to communicate. Permutations are added to achieve better mixing prop- 
erties. 

The detailed specification of how these elements are combined in the pro- 
posal distribution can be obtained from the authors. Two considerations 
guided this specification. In the first place, transparency of the algorithm. 
Paired insertions and paired deletions are not always unique inverses of 
each other. Nonuniqueness of the inverse operation would lead to compli- 
cated counting procedures to determine proposal probabilities. Therefore, 
the restriction is made that pairs of elements can only be inserted at, 
or deleted from a pair of positions in the sequence, if there are no occur- 
rences of the same (i,j) between these positions; and only if at least one 



(28) 
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other (i' if) (with %' ^ i or j' ^ j) occurs in between these positions. In this 
restricted set of operations, paired insertions and paired deletions are each 
other's unique inverses, which simplifies proposal probabilities. Due to the 
presence of permutations, all elements of the space V still are reachable from 
any element. 

The second consideration is computational efficiency. When proposing 
that a pair of elements (i,j) is inserted or deleted at certain positions, then 
also proposing to permute a segment between those positions entails no in- 
crease in computational load for calculating the Metropolis-Hastings ratios, 
and this permutation will lead to larger changes in v, and thereby, hopefully, 
better mixing for a given number of computations. 

3.4. Putting it together. This subsection combines the elements presented 
in the preceding subsections to define an algorithm for MCMC approxima- 
tion of the ML estimate. It is now assumed that an arbitrary number M > 2 
of repeated observations has been made: x(t±), xfe), ■ ■ ■ , x(tM)- As argued 
before, we condition on the first observation x(t±). The further data is de- 
noted by x = (x(t2), ■ ■ ■ ,x(tM))- The parameter (a,/3) is denoted by 6. 

The Markov property entails that the observed data score function, con- 
ditional on X(t\) = x(t\), can be decomposed as 

(29) 

M 



5 'x(t m )|X(t m _ 1 )(6 l ;^(im.)k(im-l)), 



m=2 

where Sx(t m )\x(t m -x) is the score function based on the conditional distri- 
bution of X(t m ), given X(t m _i). 

For the period from t m _i to t m , the data set is augmented by V m , which 
defines the order in which ties are changed between these time points, as 
described in Section 3.1; this can be denoted by 

V m = {(I m l,J m l), ■ • ■ , {ImR m ,JmR m )) 

with outcome v m . The augmenting variable as used in Section 3.1 is V = 
(V2, • • • , Vm)- Denote the probability (14) for the period from t m _i to t m by 
Pm(vm]6\x(t m -i)), and the corresponding total data score function by 

(30) S m {6]x{t m - 1 ),v m ) = — . 

From (22) and (29), and using the Markov property, it can be concluded 
that the observed data score function now can be written as 

Sx\x(t-i){Q'iX\x(ti)) 

(31) 

M 

= E e {S m (6; x(t m -i),V m )\X(t m -i) =x(t m -. 1 ),X(t m ) = x(t m )}. 

m=2 
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The ML estimate is obtained as the value of 9 for which (31) equals [cf. 
(23)]. 

The algorithm is iterative, and the iVth updating step now can be repre- 
sented as follows: 

1. For each m = 2, . . . , M, make a large number of the Metropolis-Hastings 
steps as described in Section 3.3, yielding i;W = (v% , ■ ■ ■ ,v^^). 

2. Compute 

M 
m=2 

using (30) with p m (v m ; 6\x(t m -i)) defined by (14) for the period from 
t m -\ to t m , and using (15) and (16) or (21). 

3. Update the provisional parameter estimate by 

§( N +V =9^ + a N D- 1 S xv (eW;xM N) ). 

As mentioned in Section 3.2, the estimate 9ml is calculated as a tail average 
of the values 9^ generated by this algorithm. The covariance matrix of the 
ML estimator is estimated using (25), where the expected values in the right- 
hand side are approximated by Monte Carlo simulation of the conditional 
distribution of V given X = x. for 9 = 9ml- This involves the matrix of 
partial derivatives (26), which can be estimated by a score- function method 
as elaborated in Schweinberger and Snijders (2006). 
The main implementation details are the following: 

a. The Method of Moments (MoM) estimator [Snijders (2001)], in practice, 
yields a good initial value 9^. 

b. To avoid long burn-in times for step (1.), the Metropolis-Hastings algo- 
rithm for generating V^ N ' can be started from the previous value, y( 7V " 1 ), 
rather than independently. This leads to some autocorrelation in the up- 
dates defined in step (3.), and the number of Metropolis-Hastings steps 
must be large enough that this autocorrelation is not too high. It may 
be noted that the Robbins Monro algorithm is robust to moderate de- 
pendence between successive updates [Kushner and Yin (2003)] . We have 
found good results when the number of steps is tuned so that the auto- 
correlations between the elements of 9^ are less than 0.3. 

c. For D in step (3.) we use a Monte Carlo estimate of the complete data 
observed Fisher information matrix (26), estimated for 9 = #W before 
making the iteration steps. 

d. For the other numerical parameters of the algorithm we use the same val- 
ues as described, for the stochastic approximation algorithm to compute 
the MoM estimator, in Snijders (2001). 
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This algorithm is implemented in Siena version 3.3 [Snijders et al. (2009)]. 
The executable program as well as the code can be found at 
http : //www . stats . ox . ac . uk/ siena/. 

4. Empirical example. As an illustrative example, the data set is used 
that was also analyzed in van de Bunt, van Duijn and Snijders (1999) and in 
Snijders (2001). This is a friendship network between 32 freshman students 
in a given discipline at a Dutch university. Initially most of the students 
were unknown to each other. There were six waves denoted t\ — t% of data 
collection, with 3 weeks between waves for the start of the academic year, 
and 6 weeks in between later. The relation studied is "being friends or close 
friends." The data set is obtainable from website http:/ /www. stats. ox. ac.uk/ 
siena/. 

The transitions between observations ti to t%, and t% to £4, will here be 
studied separately. To identify the rate function, we assume (arbitrarily but 
conveniently) that the duration of the time periods is unity, t% — 12 = £4 — £3 = 
1. To keep the model specification relatively simple, the rate function is 
supposed to be constant across actors, given by a\ from £2 to £3 and by 
«2 from £3 to £4; and the objective function (6) is chosen independent of 
the previous state x°, containing contributions of the effects of outdegree, 
number of reciprocated ties, number of transitive triplets, number of 3-cycles, 
gender of the sender of the tie ("ego"), gender of the receiver of the tie 
("alter"), and gender similarity: 



where variable Zi indicates the gender of actor i {F = 0, M = 1) and z its 
average over the 32 individuals. This model illustrates two types of triadic 
dependence (transitive triplets and 3-cycles) and the use of covariates (gen- 
der). 

For this model the parameters are estimated for the two transitions £2 — ^3 
and £3 — ti separately both by the Method of Moments (MoM) and by Max- 
imum Likelihood. For models where the functions used in (6) depend 
only on x and not on x°, so that they can be expressed as Sik(x), the MoM 
estimator as defined by Snijders (2001) is based on the vector with compo- 



nents \Xij(t m ) - Xij(t m -i)\ for m = 2, . . . , M and J2 m =2 J2i Sik(X(t m )) 



for k = 1, . . . , L. The MoM estimator is defined as the parameter vector for 





j j 
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which the observed and expected values of this statistic are equal, and can 
be determined by stochastic approximation. 

Both the moment equation and the likelihood equation can be repre- 
sented as EqS = 0, where S is the difference between observed and estimated 
moments or the complete-data score function, respectively. For both estima- 
tors, after running the stochastic approximation algorithm, convergence was 
checked by simulating the model for the obtained parameters for 2000 runs 
and calculating, for each component Sh of S, the ratio of the average simu- 
lated Sh to its standard deviation. In all cases, this ratio was less than 0.1, 
indicating adequate convergence. 

Parameter estimates and standard errors are reported in Table 1. Assum- 
ing that the estimators are approximately normally distributed (which is 
supported by the simulations reported in the next section, although we have 
no proof), the significance can be tested by referring the ratios of estimate to 
standard error to a standard normal distribution. The Method of Moments 
and Maximum Likelihood estimates are different but lead to the same sub- 
stantive conclusions. The parameters reflecting network structure, j3\ to /J4, 
give the same picture for both transitions. The negative f3± indicates that 
an outgoing tie which is not reciprocated and not transitively embedded 
is not considered attractive; the positive $2 and (3% indicate that there is 
evidence for tendencies toward reciprocation and transitivity of friendship 
choices, when controlling for all other effects in the model. For interpreting 
the 3-cycles effect, note that closed 3-cycles are structures denying a hierar- 
chically ordered relation. The negative $4 indicates a tendency away from 
closed 3-cycles, which — in conjunction with the positive transitive triplets 



Table 1 

Parameter estimates (Method of Moments and Maximum Likelihood) for data set of van 
de Bunt, van Duijn and Snijders (1999) 



Effect 




(t 2 , 


,t 3 ) 






(*3. 


,U) 




MoM 


ML 




MoM 


ML 




Est. 


S.E. 


Est. 


S.E. 


Est. 


S.E. 


Est. 


S.E. 


Rate function 


















a Rate parameter 


3.95 


0.64 


2.61 


0.36 


3.43 


0.59 


5.67 


0.75 


Objective function 


















j3\ Outdegree 


-1.66 


0.26 


-1.02 


0.26 


-2.19 


0.30 


-2.00 


0.22 


P2 Reciprocated ties 


2.06 


0.47 


1.94 


0.39 


2.26 


0.55 


1.70 


0.35 


Transitive triplets 


0.30 


0.06 


0.18 


0.06 


0.36 


0.07 


0.26 


0.05 


/?4 3-cycles 


-0.59 


0.23 


-0.42 


0.17 


-0.59 


0.27 


-0.31 


0.14 


/3s Gender alter 


0.28 


0.30 


0.14 


0.36 


0.70 


0.39 


0.61 


0.33 


/3@ Gender ego 


-0.34 


0.32 


-0.41 


0.41 


-0.04 


0.38 


-0.10 


0.33 


P7 Gender similarity 


0.33 


0.30 


0.34 


0.36 


0.48 


0.37 


0.44 


0.32 
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parameter — could be interpreted as a tendency toward a local (i.e., triadic) 
popularity hierarchy in friendship. For gender, there is only a close to sig- 
nificant effect of gender alter for the t^—t^ transition, suggesting that male 
students tend to be more popular as friends, when controlling for the other 
effects. 

From this single data example, of course no conclusions can be drawn 
concerning the relative value of these two estimation methods. 

5. Simulation examples. A comparison between the finite sample behav- 
ior of the MoM and the ML estimators can be based on simulations. Here 
we present a small simulation study as a very limited exploration of the rel- 
ative efficiency of the two estimators, which is expected to be in favor of the 
ML estimator. The limited nature of this simulation study does not allow 
generalization, but the study design is meant to be typical for applications 
to friendship networks in rather small groups, and replicates approximately 
the empirical study of the previous section. 

The model is identical to that of the previous section: there are three 
repeated observations, 32 actors, one binary covariate called gender, and 
the first observed network as well as the distribution of the covariate are 
identical to the van de Bunt data set at observation ti- Therefore, the ob- 
servation moments are again referred to as t2,t3,t^. The parameter values 
are rounded figures close to the estimates obtained in the preceding sec- 
tion. Networks for times £3 and £4 are generated and parameters estimated 
under the assumption that parameters 0k are the same in periods (£2^3) 
and (£3, £4). The simulation model has parameters a\ = 2.5 and 0L2 = 3.5, 
1 = -2, 2 = l,fo = 0.2, p A = 0,P 5 = 0.5, 06 = -0.25, and 7 = 0.5. A total 
of 1000 data sets were generated and the estimates calculated by both meth- 
ods. Table 2 reports the average estimates, the root mean squared errors, 
the rejection rates for testing the data-generating value of the parameter as 
the null hypothesis (estimating type-I error rates), and the rejection rates 
for testing that the parameters equal (estimating power), where the tests 
were two-sided tests based on the t-ratio for the corresponding parameter 
estimate, assuming a standard normal reference distribution, at a nominal 
significance level of 5%. 

Out of the 1000 generated data sets, 5 were excluded because they did 
not produce well converging results in the default specification of the al- 
gorithm for one or both estimators. Table 2 shows that the results for the 
two estimators are very similar, and the type-I error rates are close to the 
nominal value, except for the inflated type-I rates of the ML estimators for 
the two rate parameters. The latter is related to the skewed distribution 
of the rate parameter estimators. The correlations between the estimators 
are more than 0.93 for all coordinates; note that correlations are attenuated 
due to the stochastic nature of the algorithms. It can be concluded that for 
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Table 2 

Simulation results, 3 waves for 32 actors: average parameter estimates ("Ave"), root 
mean squared errors ("RMSE"), estimated type-I error rates ("a"), estimated power 

CP") 



Effect MoM estimator ML estimator 







Ave 


RMSE 


Q 


/3 


Ave 


RMSE 


Oi 


/3 


Rate function 


















ai =2.5 


Rate ti — ts 


2.46 


0.44 


0.063 




2.37 


0.44 


0.114 




o?2 = 3.5 


Rate tz — ti 


3.47 


0.55 


0.045 




3.39 


0.54 


0.099 




Objective function 


















Pi = -2.0 


Out degree 


-2.01 


0.15 


0.053 


1.00 


-1.96 


0.14 


0.047 


1.00 


p 2 = 1.0 


Reciprocation 


1.03 


0.26 


0.044 


0.96 


0.97 


0.26 


0.042 


0.95 


Pa = 0.2 


Transitivity 


0.186 


0.052 


0.043 


0.94 


0.180 


0.054 


0.064 


0.93 


p 4 = 0.0 


3-cycles 


0.02 


0.14 


0.042 




0.03 


0.14 


0.060 




Ps = 0.5 


Gender alter 


0.53 


0.24 


0.043 


0.65 


0.51 


0.25 


0.046 


0.57 


Pe, = -0.25 


Gender ego 


-0.28 


0.26 


0.062 


0.21 


-0.28 


0.27 


0.059 


0.20 


Pr = 0.5 


Gender sim. 


0.52 


0.24 


0.031 


0.64 


0.52 


0.25 


0.052 


0.60 



this type of model, characterized by 32 actors and 3 waves with rates of 
change 2.5 and 3.5, with 7 parameters in the objective function, MoM and 
ML estimation yield quite similar results. 

To explore data sets with less information, a similar simulation study 
was conducted with 20 actors, where the first network was induced by the 
*2 network of 20 of the actors in the van de Bunt data set, and the rest 
of the simulation design differed from the previous study by including an 
interaction effect of reciprocity by gender similarity, represented by the term 

(3%^XijXji{\ - \Zi - Zj\}, 

j 

with parameter (3$ = 0.5. This effect is included to achieve a higher correla- 
tion of the parameter estimators, which together with the smaller number 
of actors should lead to greater difficulties in estimation. 

The results are shown in Table 3. Of the 1000 generated data sets, 20 
were excluded from the results because of questionable convergence of the 
algorithm. The table shows that the ML estimator here performs clearly 
better than the MoM estimator. The estimated relative efficiency of MoM 
compared to ML expressed as ratio of mean squared errors ranges for these 
10 parameters from 0.50 to 0.99. The correlation between the estimators 
ranges from 0.71 (gender ego) to 0.98 (rate period 1). For all parameters 
except /?2) the estimated power of the test based on the ML estimator is 
higher than that of the MoM-based test, which can be traced back to the 
combination of a smaller mean squared error and a less conservative test. 
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Table 3 

Simulation results, 3 waves for 20 actors: average parameter estimates ("Ave"), root 
mean squared errors ("RMSE"), estimated type-I error rates ("a"), estimated power 

Cf3») 



Effect MoM estimator ML estimator 







Ave 


RMSE 


Q 


/3 


Ave 


RMSE 


a 


/3 


Rate function 


















ai = 2.5 


Rate ti — tz 


2.49 


0.50 


0.059 




2.37 


0.48 


0.093 




o?2 = 3.5 


Rate tz — ti 


3.51 


0.70 


0.044 




3.36 


0.67 


0.094 




Objective function 


















ft = -2.0 


Outdegree 


-2.09 


0.29 


0.009 


0.98 


-2.02 


0.21 


0.025 


1.00 


Pi = 1.0 


Reciprocation 


1.08 


0.35 


0.043 


0.86 


0.97 


0.33 


0.052 


0.82 


ft = 0.2 


Transitivity 


0.170 


0.099 


0.022 


0.51 


0.158 


0.096 


0.059 


0.52 


Pi = 0.0 


3-cycles 


0.01 


0.23 


0.034 




0.05 


0.23 


0.057 




ft = 0.5 


Gender alter 


0.59 


0.44 


0.020 


0.15 


0.60 


0.38 


0.048 


0.43 


ft = -0.25 


Gender ego 


-0.35 


0.50 


0.024 


0.00 


-0.34 


0.36 


0.034 


0.11 


ft = 0.5 


Gender sim. 


0.61 


0.50 


0.013 


0.12 


0.63 


0.43 


0.051 


0.35 


ft = 0.5 


G. sim. x rec. 


0.47 


0.63 


0.039 


0.10 


0.50 


0.60 


0.043 


0.14 



The surprisingly low power of the MoM-based test for the gender-ego effect 
reflects high standard errors that tend to be obtained for estimating (3§. 

6. Likelihood ratio tests. One of the important advantages of a likeli- 
hood approach is the possibility of elaborating model selection procedures. 
Here we only explain how to conduct a likelihood ratio test. 

A convenient way to estimate the likelihood ratio is by a simple imple- 
mentation of the idea of the path sampling method described by Gelman 
and Meng (1998), who took this method from the statistical physics liter- 
ature where it is known as thermodynamical integration. For arbitrary 9q 
and 6>i, defining the function 9(t) = t9\ + (1 - 1)6 , with 9 = 88/ dt = 6>i - 9 , 
this method is based on the equation 

(32) log( Px{x]dl) ) = f 9S x (x;9(t))dt. 

\Px{x;0Q)J Jo 

This integral can be approximated by replacing the integral by a finite sum 
and using (22). Applying a simulation procedure which for each 9(h/H), 
h = 0, . . . , H, generates L > 1 draws Vhi from the conditional distribution of 
V given X = x under parameter 9(h/H), the log likelihood ratio (32) can 
be approximated by 

L H 

(33) L(H + \ £ £ Sxv(0(h/H);x, V M ). 

^ ~ l ~ > i=\ h=0 



SOCIAL NETWORK DYNAMICS 



21 



Burn-in time can be considerably reduced when starting the MCMC algo- 
rithm for generating Vhi by the end result of the algorithm used to generate 

V(h-1),L- 

This was applied to the van de Bunt data set used in Section 4 to test the 
null hypothesis that all parameters Pi,. .. ,/3f are for period (£3, £4) the same 
as for (t2,ts) against the alternative that they are allowed to be different. 
The rate parameters a were allowed to be different under the null as well as 
the alternative hypothesis. For calculating (33) we used L = 10, H = 1000. 
The estimated likelihood ratio was 18.7, which for a X7 distribution yields 
p < 0.01, leading to rejecting the null hypothesis at conventional levels of 
significance. 

7. Discussion. This article has presented a model for longitudinal net- 
work data collected in a panel design and an algorithm for calculating the 
ML estimator. The model can represent triadic and other complicated de- 
pendencies that are characteristic for social networks. It is designed for ap- 
plications in social network analysis, but related models may be applied for 
modeling networks in other sciences, such as biology. Earlier, a method of 
moments (MoM) estimator for this model was proposed by Snijders (2001) 
and Bayesian inference methods by Koskinen and Snijders (2007). The al- 
gorithm was constructed using stochastic approximation according to the 
approach proposed by Gu and Kong (1998), and employs Monte Carlo sim- 
ulations of the unobserved changes between the panel waves, conditional on 
the observed data. The simulation design used is more efficient than that 
used in Koskinen and Snijders (2007) because here the waiting times between 
unobserved changes are integrated out. 

No proof is yet available for the consistency and asymptotic normality of 
the ML estimator, which are intuitively plausible for the situation where n 
tends to infinity, t\ and ti are fixed, and the parameters are such that the av- 
erage degree n~ l ^2n jXij tends to a positive finite limit. Limited simulation 
results do support the expectation that the estimators are asymptotically 
normal. The nonstandard assumptions (lack of independence) imply, how- 
ever, that a proof may be expected to be rather complicated. 

Our experience in the reported simulations and in working with empirical 
data sets is that the algorithm converges well unless data sets are small given 
the number of estimated parameters, but the algorithm is time-consuming 
(e.g., each ML estimation for one data set in Table 2 took about 35 minutes 
on a regular personal computer, while each MoM estimation took about 
2 minutes). Further improvements in the algorithm are desirable for the 
practical use of ML estimators in these models. This is the subject of future 
work. 

Judging from our very limited simulations, the advantages of ML estima- 
tion over MoM estimation in terms of root mean squared error and power 
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of the associated tests seem strong for small data sets and small for medium 
to large data sets. Further simulation studies are necessary. However, even 
if the statistical efficiency is similar, likelihood-based estimation can have 
additional advantages, for example, the possibility of extensions to more 
complicated models and of elaborating model selection procedures. 

Software implementing the procedures in this paper is available at 
http : //www . stats . ox . ac . uk/ siena/. 
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